Polar Polytopes and Recovery of Sparse Representations
Suppose we have a signal y which we wish to represent using a linear
combination of a number of basis atoms a_i, y=sum_i x_i a_i = Ax. The problem
of finding the minimum L0 norm representation for y is a hard problem. The
Basis Pursuit (BP) approach proposes to find the minimum L1 norm representation
instead, which corresponds to a linear program (LP) that can be solved using
modern LP techniques, and several recent authors have given conditions for the
BP (minimum L1 norm) and sparse (minimum L0 solutions) representations to be
identical. In this paper, we explore this sparse representation problem using
the geometry of convex polytopes, as recently introduced into the field by
Donoho. By considering the dual LP we find that the so-called polar polytope P*
of the centrally-symmetric polytope P whose vertices are the atom pairs +-a_i
is particularly helpful in providing us with geometrical insight into
optimality conditions given by Fuchs and Tropp for non-unit-norm atom sets. In
exploring this geometry we are able to tighten some of these earlier results,
showing for example that the Fuchs condition is both necessary and sufficient
for L1-unique-optimality, and that there are situations where Orthogonal
Matching Pursuit (OMP) can eventually find all L1-unique-optimal solutions with
m nonzeros even if the Exact Recovery Condition (ERC) fails for m, if allowed to run for more than m steps.
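For orientation, the optimisation problems named in the abstract above can be written out as follows. This is a sketch in standard notation, not taken from the paper itself: the labels (P_0), (P_1), (D_1) and the dual variable c are assumptions introduced here, and a_i denotes the i-th column of A.

```latex
\begin{align*}
(P_0):\quad & \min_{x} \|x\|_0 \quad \text{subject to } Ax = y, \\
(P_1):\quad & \min_{x} \|x\|_1 \quad \text{subject to } Ax = y, \\
(D_1):\quad & \max_{c} \; y^{\mathsf T} c \quad \text{subject to } |a_i^{\mathsf T} c| \le 1 \text{ for all } i, \\
& P^{*} = \{\, c : |a_i^{\mathsf T} c| \le 1 \text{ for all } i \,\}.
\end{align*}
```

The feasible set of the dual LP (D_1) is exactly the polar P* of the polytope P whose vertices are +-a_i, which is one way to see why the polar polytope appears once the dual is considered.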
An open dataset for research on audio field recording archives: freefield1010
We introduce a free and open dataset of 7690 audio clips sampled from the
field-recording tag in the Freesound audio archive. The dataset is designed for
use in research related to data mining in audio archives of field recordings /
soundscapes. Audio is standardised, and audio and metadata are Creative Commons
licensed. We describe the data preparation process, characterise the dataset
descriptively, and illustrate its use through an auto-tagging experiment.
A measure of statistical complexity based on predictive information
We introduce an information theoretic measure of statistical structure,
called 'binding information', for sets of random variables, and compare it with
several previously proposed measures including excess entropy, Bialek et al.'s
predictive information, and the multi-information. We derive some of the
properties of the binding information, particularly in relation to the
multi-information, and show that, for finite sets of binary random variables,
the processes which maximise binding information are the 'parity' processes.
Finally we discuss some of the implications this has for the use of the binding
information as a measure of complexity. Comment: 4 pages, 3 figures
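The claim about parity processes can be checked numerically on a small case. The sketch below assumes the binding information is defined as B = H(X_1..X_n) - sum_i H(X_i | all other variables); this definition is not stated in the abstract and is an assumption here, as are all function names. It compares an even-parity distribution on three bits with three independent fair coins.

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginalise a joint distribution onto the variable indices in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def multi_information(joint, n):
    """I = sum_i H(X_i) - H(X_1..X_n)."""
    singles = sum(entropy(marginal(joint, [i])) for i in range(n))
    return singles - entropy(joint)


def binding_information(joint, n):
    """Assumed definition: B = H(X_1..X_n) - sum_i H(X_i | all others)."""
    h_all = entropy(joint)
    residual = 0.0
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        # H(X_i | rest) = H(all) - H(rest)
        residual += h_all - entropy(marginal(joint, rest))
    return h_all - residual


n = 3
# Even-parity process: uniform over bit strings with an even number of ones.
even = [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 0]
parity = {bits: 1.0 / len(even) for bits in even}
# Independent fair coins, for comparison.
independent = {bits: 0.5 ** n for bits in product((0, 1), repeat=n)}

print(binding_information(parity, n), multi_information(parity, n))
print(binding_information(independent, n), multi_information(independent, n))
```

Under this assumed definition the parity distribution reaches n - 1 bits of binding information while its multi-information is only 1 bit, and the independent coins score zero on both.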
Geometrical methods for non-negative ICA: Manifolds, Lie groups and toral subalgebras
We explore the use of geometrical methods to tackle the non-negative independent component analysis (non-negative ICA) problem, without assuming the reader has an existing background in differential geometry. We concentrate on methods that achieve this by minimizing a cost function over the space of orthogonal matrices. We introduce the idea of the manifold and Lie group SO(n) of special orthogonal matrices that we wish to search over, and explain how this is related to the Lie algebra so(n) of skew-symmetric matrices. We describe how familiar optimization methods such as steepest-descent and conjugate gradients can be transformed into this Lie group setting, and how the Newton update step has an alternative Fourier version in SO(n). Finally we introduce the concept of a toral subgroup generated by a particular element of the Lie group or Lie algebra, and explore how this commutative subgroup might be used to simplify searches on our constraint surface. No proofs are presented in this article.
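A minimal sketch of steepest descent in the Lie group setting described above, under stated assumptions: the update W <- exp(-eta B) W with skew-symmetric B = G W^T - W G^T is one common form and may differ from the article's exact formulation; the cost J(W) = 0.5 ||W - T||^2 with a fixed target rotation T is a hypothetical stand-in, not the non-negative ICA cost; and the matrix exponential is a truncated Taylor series that is only adequate for small matrices and small steps.

```python
from math import cos, sin


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def expm(a, terms=30):
    """Truncated Taylor series for the matrix exponential."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def cost(w, target):
    """Hypothetical cost: half the squared Frobenius distance to `target`."""
    return 0.5 * sum((x - y) ** 2
                     for rw, rt in zip(w, target) for x, y in zip(rw, rt))


def geodesic_step(w, grad, eta):
    """Move along the group: W <- exp(-eta * B) W, with B in so(n)."""
    gwt = matmul(grad, transpose(w))
    b = sub(gwt, transpose(gwt))  # G W^T - W G^T is skew-symmetric
    return matmul(expm([[-eta * x for x in row] for row in b]), w)


def orthogonality_error(w):
    """Largest entry of |W W^T - I|."""
    d = sub(matmul(w, transpose(w)), identity(len(w)))
    return max(abs(x) for row in d for x in row)


# Target: rotation by 1 radian about the z-axis (an arbitrary choice).
target = [[cos(1.0), -sin(1.0), 0.0], [sin(1.0), cos(1.0), 0.0], [0.0, 0.0, 1.0]]
w = identity(3)
initial_cost = cost(w, target)
for _ in range(50):
    w = geodesic_step(w, sub(w, target), eta=0.2)  # gradient of the cost is W - T
final_cost = cost(w, target)
ortho_err = orthogonality_error(w)
print(initial_cost, final_cost, ortho_err)
```

Because each step multiplies W by the exponential of a skew-symmetric matrix, the iterate stays on SO(n) up to rounding error, so no re-orthogonalisation is needed after the update.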
Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders
Supervised multi-channel audio source separation requires extracting useful
spectral, temporal, and spatial features from the mixed signals. The success of
many existing systems is therefore largely dependent on the choice of features
used for training. In this work, we introduce a novel multi-channel,
multi-resolution convolutional auto-encoder neural network that works on raw
time-domain signals to determine appropriate multi-resolution features for
separating the singing-voice from stereo music. Our experimental results show
that the proposed method can achieve multi-channel audio source separation
without the need for hand-crafted features or any pre- or post-processing.
Audio Set classification with attention model: A probabilistic perspective
This paper investigates the classification of the Audio Set dataset. Audio
Set is a large scale weakly labelled dataset of sound clips. Previous work used
multiple instance learning (MIL) to classify weakly labelled data. In MIL, a
bag consists of several instances, and a bag is labelled positive if at least
one instance in the audio clip is positive. A bag is labelled negative if all
the instances in the bag are negative. We propose an attention model to tackle
the MIL problem and explain this attention model from a novel probabilistic
perspective. We define a probability space on each bag, where each instance in
the bag has a trainable probability measure for each class. Then the
classification of a bag is the expectation of the classification output of the
instances in the bag with respect to the learned probability measure.
Experimental results show that our proposed attention model, implemented with a
fully connected deep neural network, obtains a mAP of 0.327 on the Audio Set
dataset, outperforming Google's baseline of 0.314 and a recurrent neural network
at 0.325. Comment: Accepted by ICASSP 201
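The bag-level prediction described above, an expectation of instance outputs under a learned probability measure, can be sketched for a single class as follows. The function names and all numbers are illustrative assumptions; in the paper both the instance outputs and the attention scores would come from trained networks.

```python
def attention_pool(instance_probs, attention_scores):
    """Bag probability as the expectation of instance probabilities.

    `attention_scores` are non-negative and are normalised over the bag
    so that they form a probability measure on its instances.
    """
    total = sum(attention_scores)
    weights = [s / total for s in attention_scores]
    bag_prob = sum(w * p for w, p in zip(weights, instance_probs))
    return bag_prob, weights


def max_pool(instance_probs):
    """Classic MIL rule: the bag is as positive as its most positive instance."""
    return max(instance_probs)


# Hypothetical per-instance outputs for one class in one audio clip.
instance_probs = [0.1, 0.9, 0.2, 0.1]
attention_scores = [0.5, 4.0, 0.5, 1.0]

bag_prob, weights = attention_pool(instance_probs, attention_scores)
print(bag_prob, weights, max_pool(instance_probs))
```

Unlike max pooling, the weighted average lets every instance contribute to the bag prediction, with the attention weights deciding how much.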
Deep Remix: Remixing Musical Mixtures Using a Convolutional Deep Neural Network
Audio source separation is a difficult machine learning problem and
performance is measured by comparing extracted signals with the component
source signals. However, if separation is motivated by the ultimate goal of
re-mixing then complete separation is not necessary and hence separation
difficulty and separation quality are dependent on the nature of the re-mix.
Here, we use a convolutional deep neural network (DNN), trained to estimate
'ideal' binary masks for separating voice from music, to perform re-mixing of
the vocal balance by operating directly on the individual magnitude components
of the musical mixture spectrogram. Our results demonstrate that small changes
in vocal gain may be applied with very little distortion to the ultimate
re-mix. Our method may be useful for re-mixing existing mixes.
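The re-mixing step described above, changing vocal gain by scaling only the spectrogram components that a binary mask assigns to the voice, can be sketched as follows. The mask here is written by hand; in the paper it would be estimated by the convolutional DNN. The function names and the tiny 2x2 "spectrogram" are illustrative assumptions.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(mixture_mag, vocal_mask, vocal_gain):
    """Scale the masked (vocal) time-frequency bins and leave the rest as-is."""
    return [[m * (vocal_gain if v else 1.0) for m, v in zip(m_row, v_row)]
            for m_row, v_row in zip(mixture_mag, vocal_mask)]


# Hypothetical magnitude spectrogram (rows: frequency bins, columns: frames).
mixture_mag = [[1.0, 2.0], [0.5, 4.0]]
# Hypothetical binary mask: 1 marks bins attributed to the voice.
vocal_mask = [[1, 0], [0, 1]]

doubled = remix(mixture_mag, vocal_mask, 2.0)
louder = remix(mixture_mag, vocal_mask, db_to_gain(6.0))
print(doubled, louder)
```

Because non-vocal bins pass through unchanged, a mask error only matters in proportion to the gain change applied, which is consistent with the abstract's observation that small changes in vocal gain cause little distortion.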
Surrey-cvssp system for DCASE2017 challenge task4
In this technical report, we present a set of methods for task 4 of the
Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017)
challenge. This task evaluates systems for the large-scale detection of sound
events using weakly labeled training data. The data are YouTube video excerpts
focusing on transportation and warnings due to their industry applications.
There are two tasks, audio tagging and sound event detection from weakly
labeled data. A convolutional neural network (CNN) and a gated recurrent unit (GRU)
based recurrent neural network (RNN) are adopted as our basic framework. We
propose a learnable gating activation function for selecting informative local
features. An attention-based scheme is used for localizing the specific events in
a weakly-supervised mode. A new batch-level balancing strategy is also proposed
to tackle the data imbalance problem. Fusion of posteriors from different
systems is found to be effective in improving performance. In summary, we obtain
a 61% F-value for the audio tagging subtask and a 0.73 error rate (ER) for the
sound event detection subtask on the development set, whereas the official
multilayer perceptron (MLP) based baseline obtained only a 13.1% F-value for
audio tagging and an ER of 1.02 for sound event detection. Comment: DCASE2017 challenge ranked 1st system, task4, tech report
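One common way to realise a learnable gating activation is a gated linear unit, where a linear response is multiplied elementwise by a sigmoid gate computed from the same input. The sketch below assumes that form; the report's abstract does not spell out the exact function, so treat the formula, the function names, and the toy weights as assumptions.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def affine(x, weights, bias):
    """One affine map: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_linear_unit(x, w_lin, b_lin, w_gate, b_gate):
    """Assumed form: (W x + b) * sigmoid(V x + c), elementwise."""
    linear = affine(x, w_lin, b_lin)
    gate = [sigmoid(g) for g in affine(x, w_gate, b_gate)]
    return [l * g for l, g in zip(linear, gate)]


# Toy input and weights, chosen only for illustration.
x = [1.0, 2.0]
w_lin, b_lin = [[1.0, 1.0], [0.5, -1.0]], [0.0, 0.0]

# A gate with zero weights and bias passes exactly half of each feature.
half_open = gated_linear_unit(x, w_lin, b_lin, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
# A strongly negative gate bias suppresses the features almost entirely.
closed = gated_linear_unit(x, w_lin, b_lin, [[0.0, 0.0], [0.0, 0.0]], [-20.0, -20.0])
print(half_open, closed)
```

Since the gate weights are trained together with the linear weights, the network can learn which local features to pass and which to suppress, which matches the stated purpose of selecting informative local features.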